Goto

Collaborating Authors

 distance query


Testing properties of trees in graphical models with covariance queries

arXiv.org Machine Learning

We consider the problem of testing properties of graphs underlying high-dimensional graphical models. We adopt the model of covariance queries introduced by Lugosi, Truszkowski, Velona, and Zwiernik (2021). We study the case when the underlying graph is a tree. The main results of the paper show that, while reconstructing the entire tree may be costly, certain global structural properties can be tested efficiently. In particular, we design randomized tests for global structural properties that use a sub-quadratic number of queries. We develop testing procedures for several fundamental properties, including the number of leaves, the maximum degree, the typical distance, and the diameter of the tree. For each property, we obtain explicit query complexity bounds that depend on the target threshold and tolerance parameters.


Efficiently Computing Similarities to Private Datasets

arXiv.org Artificial Intelligence

Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data. We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) data structure which approximates $\sum_{x \in X} f(x,y)$ for any query $y$. We consider the cases where $f$ is a kernel function, such as $f(x,y) = e^{-\|x-y\|_2^2/\sigma^2}$ (also known as DP kernel density estimation), or a distance function such as $f(x,y) = \|x-y\|_2$, among others. Our theoretical results improve upon prior work and give better privacy-utility trade-offs as well as faster query times for a wide range of kernels and distance functions. The unifying approach behind our results is leveraging `low-dimensional structures' present in the specific functions $f$ that we study, using tools such as provable dimensionality reduction, approximation theory, and one-dimensional decomposition of the functions. Our algorithms empirically exhibit improved query times and accuracy over prior state of the art. We also present an application to DP classification. Our experiments demonstrate that the simple methodology of classifying based on average similarity is orders of magnitude faster than prior DP-SGD based approaches for comparable accuracy.


UrbanFly: Uncertainty-Aware Planning for Navigation Amongst High-Rises with Monocular Visual-Inertial SLAM Maps

arXiv.org Artificial Intelligence

We present UrbanFly: an uncertainty-aware real-time planning framework for quadrotor navigation in urban high-rise environments. A core aspect of UrbanFly is its ability to robustly plan directly on the sparse point clouds generated by a Monocular Visual Inertial SLAM (VINS) backend. It achieves this by using the sparse point clouds to build an uncertainty-integrated cuboid representation of the environment through a data-driven monocular plane segmentation network. Our chosen world model provides faster distance queries than the more common voxel-grid representation, and UrbanFly leverages this capability in two different ways leading to two trajectory optimizers. The first optimizer uses a gradient-free cross-entropy method to compute trajectories that minimize collision probability and smoothness cost. Our second optimizer is a simplified version of the first and uses a sequential convex programming optimizer initialized based on probabilistic safety estimates on a set of randomly drawn trajectories. Both our trajectory optimizers are made computationally tractable and independent of the nature of underlying uncertainty by embedding the distribution of collision violations in Reproducing Kernel Hilbert Space. Empowered by the algorithmic innovation, UrbanFly outperforms competing baselines in metrics such as collision rate, trajectory length, etc., on a high-fidelity AirSim simulator augmented with synthetic and real-world dataset scenes.


Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling

AAAI Conferences

We propose an indexing scheme for top-k shortest-path distance queries on graphs, which is useful in a wide range of important applications such as network-aware search and link prediction. While considerable effort has been made for efficiently answering standard (top-1) distance queries, none of previous methods can be directly extended for top-k distance queries. We propose a new framework for top-k distance queries based on 2-hop cover and then present an efficient indexing algorithm based on the simple but effective recent notion of pruned landmark labeling. Extensive experimental results on real social and web graphs show the scalability, efficiency and robustness of our method. Moreover, we demonstrate the usefulness of top-k distance queries through an application to link prediction.


Heuristic-Aided Compressed Distance Databases

AAAI Conferences

Answering point-to-point distance queries is important inmany applications, including games, robotics and vehiclerouting in operations research. Searching in a graph to answer distance queries on demandcan often be too slow.An alternative strategy, taken in methods such asTransit and Hub Labels, is to pre-compute information that can help computedistances much faster.To be practical, such methods need to generate muchless preprocessed data than a naive all-pairs distance table. We present Heuristic-Aid Compressed Distance Databases (HCDs),pre-computed data structures based on the observation thatheuristic distance estimations can sometimes coincide with true distances.Compared to a naive all-pairs distance table,we report compression factors of two to three orders of magnitude in a wide range ofmaps, reducing the memory usage to a reasonable size. Comparedto compressed path databases, our approachgenerally generates smaller databases, and answers query distances faster.


Efficient Clustering with Limited Distance Information

arXiv.org Artificial Intelligence

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s 2 S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. We use our algorithm to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire dataset. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.


TRANSIT Routing on Video Game Maps

AAAI Conferences

TRANSIT is a fast and optimal technique for computing shortest path costs in road networks. It is attractive for its usually modest memory requirements and impressive running times. In this paper we give a first analysis of TRANSIT routing on a set of popular grid-based video-game benchmarks taken from the AI pathfinding literature. We show that in the presence of path symmetries, which are inherent to most grids but normally not road networks, TRANSIT is strongly and negatively impacted, both in terms of performance and memory requirements. We address this problem by developing a new general symmetry breaking technique which adds small random epsilon-values to edges in the search graph, reducing the size of the TRANSIT network by up to 4 times while preserving optimality. Using our enhancements TRANSIT achieves up to four orders of magnitude speed improvement vs. A* search and uses in many cases only a small (<=10MB) or modest (<= 50MB) amount of memory. We also compare TRANSIT with CPDs, a recent and very fast database-driven pathfinding approach. We find the algorithms have complementary strengths but also identify a class of problems for which TRANSIT is up to two orders of magnitude faster than CPDs using a comparable amount of memory.